LM Studio: Exploring AI Models from Your Desktop


LM Studio is a desktop app (Mac, Windows, Linux) that downloads and runs local LLMs behind a polished UI. No terminal, no complicated setup: open it, pick a model, chat. It suits exploratory developers, data analysts, journalists handling sensitive data, and anyone who wants to try LLMs without sending queries to the cloud.

This article covers what it offers, when it’s better than Ollama or OpenWebUI, and where it has limits.

What LM Studio Does

Main features:

  • Model download from Hugging Face with one click.
  • Local execution over llama.cpp (under the hood).
  • Polished chat UI.
  • Local OpenAI-compatible API that other apps can consume.
  • RAG with your documents (PDF, TXT, DOCX) — chat with your files.
  • Saved prompt management.
  • Side-by-side model comparison.

All in a desktop binary, no terminal, no YAML config.

Installation

Download it from lmstudio.ai: DMG for Mac, MSI for Windows, AppImage for Linux. Open it.

On first launch it asks you to select a model. Recommended starting points:

  • Mac Apple Silicon: Llama 3 8B Q4_K_M (~5GB) or Phi-3 Mini (3GB).
  • PC with 16GB RAM: Mistral 7B Q4 (~4GB) or Phi-3.
  • PC with 32GB+ RAM: Mixtral 8x7B Q4 (~25GB) or quantised Llama 3 70B (~40GB).

Download it, load it, and you're ready to chat.

Usage Experience

For a non-technical user:

  • UI with model selector at start.
  • Chat with visual parameters (temperature, top_p, context length).
  • File upload for local RAG.
  • Export/import conversations.
  • Pre-configured prompt templates for common cases.

For a developer:

  • OpenAI-compatible API server at localhost:1234.
  • Multiple models loaded simultaneously.
  • Logs of each query and tokens consumed.
  • GPU offloading configurable (CPU+GPU hybrid).

OpenAI-Compatible API

An underrated feature: LM Studio exposes an OpenAI-compatible API. Your existing code works:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="not-needed"
)

response = client.chat.completions.create(
    model="local-model",  # ignored; LM Studio uses the currently loaded model
    messages=[{"role": "user", "content": "Hi"}]
)

Useful for offline development, privacy-sensitive apps, or as a fallback if OpenAI goes down.
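The same endpoint can also be hit with nothing but the standard library, which is handy for quick checks without installing the openai package. A minimal sketch, assuming LM Studio's default port (1234) and that a model is already loaded:

```python
import json
import urllib.request

def build_chat_payload(prompt: str, model: str = "local-model",
                       temperature: float = 0.7) -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": model,  # LM Studio ignores this and uses the loaded model
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

def ask_local(prompt: str, base_url: str = "http://localhost:1234/v1") -> str:
    """Send a chat completion to the local LM Studio server and return the reply."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_chat_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `ask_local("Hi")` returns the model's reply; no API key is needed.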

Local RAG with Your Documents

LM Studio integrates ingestion and RAG:

  1. Drag PDFs/docs to the chat.
  2. The system extracts text and generates embeddings locally.
  3. Chat uses relevant context from your docs.

For lawyers, doctors, journalists with confidential data: zero cloud exposure. Document store stays local.
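Under the hood, the retrieval step boils down to embedding document chunks and ranking them by similarity to the query. A toy sketch of that step (the embedding vectors here are made up for illustration; LM Studio computes real ones locally):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k_chunks(query_emb, chunks, k=2):
    """Rank (text, embedding) chunks by similarity to the query embedding."""
    scored = [(cosine_similarity(query_emb, emb), text) for text, emb in chunks]
    scored.sort(reverse=True)
    return [text for _, text in scored[:k]]

# Toy data: (chunk text, fake 3-dimensional embedding)
chunks = [
    ("Contract clause about liability", [0.9, 0.1, 0.0]),
    ("Appendix with office addresses",  [0.0, 0.2, 0.9]),
    ("Indemnification terms",           [0.8, 0.3, 0.1]),
]
query = [1.0, 0.2, 0.0]  # fake embedding of "what are the liability terms?"
context = top_k_chunks(query, chunks)  # most relevant chunks go into the prompt
```

The top-ranked chunks are what gets injected into the chat context, which is why answers stay grounded in your own documents.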

Hardware and Performance

On Apple Silicon M2/M3:

  • Llama 3 8B Q4: 30-50 tokens/s on M2 Pro.
  • Mistral 7B Q4: similar.
  • Mixtral 8x7B Q4: 15-25 tokens/s on M3 Max 64GB.
  • Llama 3 70B Q4: 5-10 tokens/s if it fits unified memory.

On Windows with NVIDIA GPU:

  • RTX 4090: Llama 3 70B Q4 at ~15 tokens/s.
  • RTX 4070/4080: 7B-13B are sweet spot.
  • Laptop with 3050/4050: limited VRAM; CPU inference is often the better option.

CPU-only is viable for small models (3B) with slower but usable responses.
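A rough way to measure throughput on your own hardware is to time a completion and divide tokens generated by elapsed seconds. A sketch with the generation call injected as a function, so the timing logic stands alone (the real call would go through the local API):

```python
import time
from typing import Callable

def measure_tokens_per_second(generate: Callable[[], int]) -> float:
    """Time a generation call that returns the number of tokens it produced."""
    start = time.perf_counter()
    n_tokens = generate()
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

# Stand-in generator that "produces" 120 tokens in ~0.1 s;
# swap in a real request against localhost:1234 to benchmark a model.
def fake_generate() -> int:
    time.sleep(0.1)
    return 120

tps = measure_tokens_per_second(fake_generate)  # roughly 1200 tokens/s here
```

Run the same prompt a few times and average: first runs include prompt processing, so steady-state numbers are what to compare against the figures above.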

LM Studio vs Ollama

Honest comparison:

| Aspect | LM Studio | Ollama |
|---|---|---|
| UI | Rich desktop | Minimal (CLI + optional web) |
| Installation | DMG/MSI installer | CLI binary |
| Models | Direct from Hugging Face | Own registry + GGUF |
| API | OpenAI-compatible | OpenAI-compatible |
| Built-in RAG | Yes | Via OpenWebUI |
| Multi-model loading | Yes | Yes |
| Linux | AppImage (beta) | Mature native |
| Target audience | Non-technical users + devs | Devs |
| License | Closed (free) | Open source (MIT) |

LM Studio wins for non-technical-user UX. Ollama wins for dev/CLI stack integration and open-source.

LM Studio vs OpenWebUI

OpenWebUI is a web UI for Ollama/other LLM backends.

| Aspect | LM Studio | OpenWebUI + Ollama |
|---|---|---|
| Deploy | Local desktop app | Docker container |
| Multi-user | No (single-user) | Yes |
| UI quality | Excellent | Very good |
| Self-hosted | Per user | For the team |
| Open-source | No | Yes |
LM Studio is personal / single-user. OpenWebUI is team / multi-user self-hosted.

Real Use Cases

Where we see LM Studio:

  • Developers testing models before deploying.
  • Data scientists iterating with LLMs without cloud.
  • Journalists and lawyers with confidential documents.
  • Students learning about LLMs without spending on APIs.
  • Small companies with laptop fleets and strict compliance.

Where it doesn’t fit:

  • Production servers (use Ollama/vLLM).
  • Simultaneous multi-user (use OpenWebUI).
  • Scaling with multiple concurrent sessions.
  • Non-GUI environments (SSH-only servers).

Limitations

Honestly:

  • Closed-source (not OSS), though free. Potential lock-in.
  • Update cadence depends on LM Studio team.
  • Not easily integrable into CI pipelines.
  • Single-machine: doesn’t distribute inference.
  • Telemetry is optional, but worth verifying in settings.

Performance Tuning

Three key tunings:

  • GPU layers: how many model layers are offloaded to the GPU. More = faster, but needs more VRAM.
  • Context length: max tokens. Lower = faster + less memory.
  • Thread count: for CPU inference, match physical cores (not HT logical).

Experiment with these until you find the right speed/memory balance for your hardware.
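A useful rule of thumb when tuning: a quantized model needs roughly (parameters × bits per weight ÷ 8) bytes, plus overhead for context and runtime buffers. A back-of-envelope sketch (the 20% overhead factor is an assumption for illustration, not an LM Studio figure):

```python
def approx_model_size_gb(params_billions: float, bits_per_weight: float,
                         overhead: float = 1.2) -> float:
    """Rough memory footprint of a quantized model, in GB."""
    weight_bytes = params_billions * 1e9 * (bits_per_weight / 8)
    return weight_bytes * overhead / 1e9

# Llama 3 8B at ~4.5 bits/weight (typical for Q4_K_M) lands near
# the ~5GB quoted in the installation section.
size = approx_model_size_gb(8, 4.5)
```

If the estimate exceeds your VRAM, that's the signal to lower GPU layers or pick a smaller quantization.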

Model Recommendations

For Apple Silicon M2/M3:

  • General chat: Llama 3 8B Instruct Q4_K_M.
  • Code: DeepSeek Coder 6.7B Q4.
  • Spanish: Mixtral 8x7B if it fits.
  • Reasoning: Phi-3 Medium.

For modest hardware:

  • Phi-3 Mini (3.8B): excellent for size.
  • Gemma 2B: very light.
  • TinyLlama 1.1B: experimentation only.

Privacy and Data

LM Studio runs everything locally:

  • Models downloaded and stored on disk.
  • Chats stored in ~/.cache/lm-studio/.
  • RAG documents stay local.
  • Optional telemetry for analytics (check settings).
  • No mandatory cloud.

For sensitive data, that's a reasonable guarantee: nothing leaves your machine unless you enable it.

Conclusion

LM Studio is the best option for individuals wanting to explore local LLMs with polished UI. For teams, Ollama + OpenWebUI offers more flexibility. For production, neither — use vLLM or TGI. LM Studio occupies a specific but important niche: democratising local LLM access for non-technical users. Free and polished, it’s the obvious choice in its category. For people handling private data or wanting to experiment without paying for APIs, it’s worth downloading this afternoon.

Follow us on jacar.es for more on local LLMs, AI tools, and privacy.
